Introduction

This project provides a Java implementation for estimating properties of complex biochemical mixtures that consist of linear chains built from a small set of building blocks. The package originated with a publication on the characterization of Heparan Sulfate (HS):

“Combining measurements to estimate properties and characterization extent of complex biochemical mixtures; applications to Heparan Sulfate”. J.R. Pradines, D. Beccati, M. Lech, J. Ozug, V. Farutin, Y. Huang, N.S. Gunay & I. Capila. Scientific Reports volume 6, Article number: 24829 (2016) [full text].

Bovine kidney Heparan Sulfate consists of linear chains, oriented from the non-reducing end to the reducing end, that are built from at least 13 building blocks. In the publication referenced above, building blocks are divided into two categories according to their sulfation status: sulfated (S) and unsulfated (U).

Two questions are answered in the paper:

  • Is the average proportion of S the same at each position from the non-reducing end?
    • If yes, this is called Homogeneity (H).
    • If no, this is called Nonhomogeneity (N).
  • Does the S/U identity of a block at position \(i\) influence the identity of the block at position \(i+1\)?
    • If yes, this is called Correlation (C).
    • If no, this is called Independence (I).
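To make the two questions concrete, here is a minimal Java sketch on a toy mixture. It assumes, purely for illustration, that chains can be observed directly as strings over S and U and that all chains have the same length. The class ChainStatsSketch and its methods are hypothetical and are not part of this package.

```java
import java.util.List;

final class ChainStatsSketch {

    /** Proportion of chains carrying 'S' at each position (all chains assumed to have equal length). */
    static double[] sulfatedProportion(List<String> chains) {
        int length = chains.get(0).length();
        double[] proportion = new double[length];
        for (String chain : chains) {
            for (int i = 0; i < length; i++) {
                if (chain.charAt(i) == 'S') {
                    proportion[i] += 1.0;
                }
            }
        }
        for (int i = 0; i < length; i++) {
            proportion[i] /= chains.size();
        }
        return proportion;
    }

    /** Fraction of chains with 'S' at position i + 1 among chains whose block at position i equals current. */
    static double nextSulfatedGiven(List<String> chains, int i, char current) {
        int matching = 0;
        int nextSulfated = 0;
        for (String chain : chains) {
            if (chain.charAt(i) == current) {
                matching++;
                if (chain.charAt(i + 1) == 'S') {
                    nextSulfated++;
                }
            }
        }
        return matching == 0 ? Double.NaN : (double) nextSulfated / matching;
    }

    public static void main(String[] args) {
        // Toy mixture read from the non-reducing end (index 0); this is not experimental data.
        List<String> chains = List.of("SSUU", "SUSU", "SSUS", "USUU");
        double[] p = sulfatedProportion(chains);
        for (int i = 0; i < p.length; i++) {
            System.out.printf("P(S at position %d) = %.3f%n", i, p[i]);
        }
        System.out.printf("P(S at 1 | S at 0) = %.3f%n", nextSulfatedGiven(chains, 0, 'S'));
        System.out.printf("P(S at 1 | U at 0) = %.3f%n", nextSulfatedGiven(chains, 0, 'U'));
    }
}
```

In this toy mixture the proportion of S changes along the chain (3/4 at the first two positions, 1/4 at the last two), which corresponds to Nonhomogeneity, and the two conditional frequencies differ (2/3 versus 1), which corresponds to Correlation.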

Therefore, there are four possible ways to model the mixture of chains.

Model   Name                             Main Class
H&I     Homogeneity & Independence       HIModel
H&C     Homogeneity & Correlation        HCModel
N&I     Nonhomogeneity & Independence    NIModel
N&C     Nonhomogeneity & Correlation     MaxEntModel

The publication shows that the first three models cannot reproduce the experimental data on the mixture. Maximum-entropy modeling (least-assumption modeling) is therefore used to estimate Nonhomogeneity and Correlation.
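As a sketch of what least-assumption modeling means, the generic maximum-entropy problem with linear constraints can be written as follows. The notation is an assumption made for illustration and is not taken from the publication: \(p_k\) is the proportion of the \(k\)-th candidate chain sequence, and row \(j\) of a matrix \(A\), together with \(b_j\), encodes one experimental measurement, assumed here to be a linear equality.

```latex
\begin{aligned}
\text{maximize}\quad & -\sum_{k} p_k \log p_k \\
\text{subject to}\quad & \sum_{k} A_{jk}\, p_k = b_j, \qquad j = 1, \dots, m, \\
& \sum_{k} p_k = 1, \qquad p_k \ge 0 .
\end{aligned}

% Lagrange dual, with one free multiplier \lambda_j per measurement:
\min_{\lambda \in \mathbb{R}^m} \;
  \log \sum_{k} \exp\Bigl(-\sum_{j} \lambda_j A_{jk}\Bigr)
  + \sum_{j} \lambda_j b_j ,
\qquad
p_k^{\star} =
  \frac{\exp\bigl(-\sum_{j} \lambda_j^{\star} A_{jk}\bigr)}
       {\sum_{l} \exp\bigl(-\sum_{j} \lambda_j^{\star} A_{jl}\bigr)} .
```

Under these assumptions the dual is an unconstrained, convex log-sum-exp minimization, which is the type of problem that the optimization methods section describes as a geometric program in convex form.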

Mathematical overview

Optimization methods

Each model's fit to the experimental constraints is tested with several optimization techniques.

  • Linear Programming (LP) with the Simplex Method
    • Feasibility analysis is used to test whether a model fits experimental measurements formulated as linear constraints.
    • Lower and upper bounds on some mixture properties are estimated with LP, subject to the experimental constraints.
    • See classes AVSFormCons, HCModelLIC, LineqCons, NIModelLIC and SFormCons for the representation of linear constraints.
    • LP is implemented with the combination of classes Simplex, SimplexPhaseI and Tableau.
    • See class GradientWidth for the derivation of bounds.
  • Non-Negative Quadratic Programming (NNQP)
    • Coordinate descent is used to perform projection onto a polyhedral set. This makes it possible to use simulated annealing on a problem with a non-convex objective function while preserving the linear constraints that represent a Markov model (H&C) or a profile of independent proportions (N&I).
    • See classes ProjOnPolyHSet and CoordDescNNQP.
  • Maximum Entropy Modeling (MaxEnt)
    • MaxEnt modeling is solved via its dual problem, which is here an unconstrained geometric program in convex form. This geometric program is solved with Newton’s method.
    • See classes MaxEntModel, MaxEntOptim and MaxEntModelW.
  • Simulated Annealing
    • Simulated annealing is used as a heuristic to minimize non-convex objective functions subject to linear constraints. Constraints are enforced at each perturbation via projection (NNQP).
    • Relevant classes are HCModelSA and NIModelSA.
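The projection step used with NNQP and simulated annealing can be illustrated with a short Java sketch. It assumes the polyhedral set is written as \(\{x : Ax \le b\}\) and uses the standard dual formulation of Euclidean projection: the dual is a non-negative quadratic program in the multipliers, solved here by cyclic coordinate descent. The class ProjectionSketch, its method names and the fixed number of sweeps are hypothetical choices for illustration; this is not the code of ProjOnPolyHSet or CoordDescNNQP.

```java
final class ProjectionSketch {

    /** Minimizes 0.5 * z'Qz - r'z subject to z >= 0 by cyclic coordinate descent. */
    static double[] solveNnqp(double[][] q, double[] r, int sweeps) {
        int m = r.length;
        double[] z = new double[m];
        for (int s = 0; s < sweeps; s++) {
            for (int i = 0; i < m; i++) {
                if (q[i][i] <= 0.0) {
                    continue; // row of A is all zeros: this coordinate has no curvature, skip it
                }
                double gradient = -r[i];
                for (int j = 0; j < m; j++) {
                    gradient += q[i][j] * z[j];
                }
                // Exact minimization along coordinate i, clipped at zero.
                z[i] = Math.max(0.0, z[i] - gradient / q[i][i]);
            }
        }
        return z;
    }

    /** Euclidean projection of y onto {x : Ax <= b}, computed through the dual NNQP. */
    static double[] project(double[][] a, double[] b, double[] y, int sweeps) {
        int m = a.length;
        int n = y.length;
        double[][] q = new double[m][m]; // Q = A A'
        double[] r = new double[m];      // r = A y - b
        for (int i = 0; i < m; i++) {
            for (int j = 0; j < m; j++) {
                for (int k = 0; k < n; k++) {
                    q[i][j] += a[i][k] * a[j][k];
                }
            }
            r[i] = -b[i];
            for (int k = 0; k < n; k++) {
                r[i] += a[i][k] * y[k];
            }
        }
        double[] z = solveNnqp(q, r, sweeps);
        double[] x = y.clone();          // x = y - A' z
        for (int i = 0; i < m; i++) {
            for (int k = 0; k < n; k++) {
                x[k] -= a[i][k] * z[i];
            }
        }
        return x;
    }

    public static void main(String[] args) {
        // Toy set: x1 >= 0, x2 >= 0, x1 + x2 <= 1, written as Ax <= b.
        double[][] a = {{-1, 0}, {0, -1}, {1, 1}};
        double[] b = {0, 0, 1};
        double[] x = project(a, b, new double[] {1, 1}, 200);
        System.out.printf("projection = (%.4f, %.4f)%n", x[0], x[1]);
    }
}
```

In a simulated annealing loop, a perturbed candidate would be passed through a projection of this kind before its objective value is evaluated, so that every accepted candidate satisfies the linear constraints.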

Derivation of constraints

The mathematical derivations of the constraints and the formulation of the optimization problems are given in the supplementary material of the publication. These derivations rely on probability calculus. Below are links to pages of the supplementary material for each model.

Summary table

The following table lists Java classes, their corresponding model, their main role, links to their javadoc and links to their page in the supplementary material. Note that the table is interactive: rows can be sorted by column values and the search box acts as a filter.

View the API Javadoc

Integration with R (rJava)

Although the package is written in Java, it can be used from R through the rJava package. Below is an example.

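library(rJava)
# Paths below are relative to the R working directory; check it with getwd() if needed.
# Locate the jar that bundles the package with its dependencies.
jar_path <- normalizePath("../target/heparan-sulfate-1.0-SNAPSHOT-jar-with-dependencies.jar", mustWork = TRUE)
# Start the JVM with the jar on the classpath.
.jinit(classpath = jar_path)
# Absolute path of the input directory, with a trailing slash appended.
input_dir <- paste0(normalizePath("../input/", mustWork = TRUE), "/")
# Call the static method getHeparanSulfateSpecSummary of class heparansulfate.CSpec; it returns a String.
summary_text <- .jcall("heparansulfate.CSpec", "Ljava/lang/String;", "getHeparanSulfateSpecSummary", input_dir)
# Print the returned summary.
cat(summary_text)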
## Heparinase I Specificities:
## 1.0
## 0.03
## 
## Heparinase III Specificities:
## 0.033
## 1.0

For further examples, see the following document, which shows how to derive the figures of the publication.

References

  • Pradines, J. R., et al. (2016). “Combining measurements to estimate properties and characterization extent of complex biochemical mixtures; applications to Heparan Sulfate.” Scientific Reports, 6, 24829. DOI: 10.1038/srep24829.
  • Urbanek, S. (2021). rJava: Low-Level R to Java Interface. R package version 1.0-6.
  • Xie, Y., Cheng, J., & Tan, X. (2024). DT: An R Interface to the ‘DataTables’ JavaScript Library. R package version 0.33.